
Add threat model + security-model discoverability (AGENTS.md -> SECURITY.md -> THREAT_MODEL.md) #17823

Open
potiuk wants to merge 4 commits into apache:master from potiuk:asf-security/threat-model-2026-06-02

potiuk (Member) commented Jun 2, 2026

What this is

A draft threat model for Apache IoTDB, proposed by the ASF Security team for the IoTDB PMC to review, correct, or reject. It is a starting point for discussion, not a finished document.

This PR:

  • adds THREAT_MODEL.md — the draft model, following the ASF Security threat-model rubric;
  • adds SECURITY.md — a short security policy that links the threat model;
  • appends a ## Security section to the existing AGENTS.md, so the chain AGENTS.md → SECURITY.md → THREAT_MODEL.md is mechanically discoverable by automated security scanners.

How to read it

Every claim is provenance-tagged:

  • (documented) — taken from IoTDB's own docs/repo;
  • (inferred) — reasoned from the architecture, not yet confirmed;
  • (maintainer) — confirmed by the PMC.
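Because the tags are literal strings, the PMC review queue can be pulled out mechanically. Below is a minimal sketch. It assumes the tags appear verbatim as `(documented)`, `(inferred)` or `(maintainer)`, optionally followed by an attribution inside the same parentheses. The function names are hypothetical and not part of any ASF tooling.

```python
import re
from collections import Counter

# Assumption: provenance tags are written literally, e.g. "(inferred)" or
# "(maintainer - HTHou)"; only the leading tag word is inspected.
TAG_RE = re.compile(r"\((documented|inferred|maintainer)\b")


def tally_provenance(text: str) -> Counter:
    """Count how many claims carry each provenance tag."""
    return Counter(match.group(1) for match in TAG_RE.finditer(text))


def review_queue(text: str) -> list:
    """Return the lines that still carry an (inferred) tag, i.e. what the PMC must confirm."""
    return [line.strip() for line in text.splitlines()
            if any(m.group(1) == "inferred" for m in TAG_RE.finditer(line))]


# Illustrative input only; not taken from the actual THREAT_MODEL.md.
SAMPLE = """\
- Thrift session protocol is enabled by default (documented)
- Inter-node traffic runs on a trusted network (inferred)
- Default root:root is a must-change (maintainer - HTHou)
"""

print(tally_provenance(SAMPLE))
print(review_queue(SAMPLE))
```

To run this against the real file, pass it the contents of `THREAT_MODEL.md` instead of `SAMPLE`. Promoting a claim then means rewriting its tag once the PMC confirms it.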

This v0 is deliberately inferred-heavy (~14 documented / ~41 inferred). The §14 Open questions section collects every inferred claim into four waves for the PMC to confirm or correct — that is where review time is best spent. The highest-impact ones:

  • deployment posture, and whether the default root:root admin is a supported production posture or a documented must-change (wave 1);
  • whether UDF / Trigger / Pipe / AINode-model server-side code execution is by-design, gated by privilege (wave 3);
  • where the resource / DoS line sits — is an expensive query a bug? (wave 4).

Nothing here is a requirement — the model is for the PMC to own. Comment inline, edit the branch directly, or reply on the email thread; we'll fold in your answers and promote the (inferred) tags as they are confirmed.

Note: in apache/iotdb, AGENTS.md is a symlink to CLAUDE.md, so the ## Security section in this PR lands in CLAUDE.md by design — the AGENTS.md → SECURITY.md → THREAT_MODEL.md discoverability chain resolves through the symlink.
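The symlink is transparent to anything that reads files through the filesystem. Below is a minimal sketch of how a scanner might walk the chain. It assumes each hop is an ordinary relative Markdown link such as `[SECURITY.md](SECURITY.md)` and a filesystem with POSIX-style symlinks. `follow_chain` and the throwaway demo repository are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Captures the target of a Markdown link like [text](target), ignoring anchors.
LINK_RE = re.compile(r"\]\(([^)#\s]+)")


def follow_chain(repo: Path, start: str = "AGENTS.md",
                 hops: tuple = ("SECURITY.md", "THREAT_MODEL.md")) -> list:
    """Return the real file names visited while following the chain."""
    current = repo / start
    visited = [current.resolve().name]  # resolve() reveals where a symlink points
    for target in hops:
        # read_text() follows symlinks, so AGENTS.md -> CLAUDE.md is transparent
        text = current.read_text(encoding="utf-8")
        linked = {Path(href).name for href in LINK_RE.findall(text)}
        if target not in linked:
            raise LookupError(f"{current.name} does not link to {target}")
        current = repo / target
        visited.append(current.resolve().name)
    return visited


def demo() -> list:
    """Build a throwaway repo mirroring the layout described above and walk it."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "CLAUDE.md").write_text(
            "## Security\n\nSee [SECURITY.md](SECURITY.md).\n", encoding="utf-8")
        (repo / "AGENTS.md").symlink_to("CLAUDE.md")  # relative symlink
        (repo / "SECURITY.md").write_text(
            "Threat model: [THREAT_MODEL.md](THREAT_MODEL.md)\n", encoding="utf-8")
        (repo / "THREAT_MODEL.md").write_text("# Threat model\n", encoding="utf-8")
        return follow_chain(repo)


print(demo())
```

The first visited entry resolves to `CLAUDE.md`. That is why the `## Security` section belongs in that file.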

potiuk (Member, Author) commented Jun 2, 2026

Heads-up on the red Simple (17) check: it's failing at the actions/checkout@v5 step on the self-hosted runner — fatal: unable to access 'https://github.com/apache/iotdb/': gnutls_handshake() failed: The TLS connection was non-properly terminated (git exit 128) — i.e. before any build runs. That's a runner-side network/TLS issue, not this PR: the change is documentation-only (THREAT_MODEL.md + SECURITY.md + the AGENTS.md/CLAUDE.md Security section) and the rest of CI is green. A re-run reproduced it identically, so the runner likely needs a restart/repair. Flagging so the red isn't read as a defect in the PR.

HTHou (Contributor) commented Jun 3, 2026

Thanks for preparing this. Speaking as an IoTDB PMC member, I think this is a useful v0 draft and a good starting point for the PMC to own and refine. I agree with the approach of keeping inferred claims explicit and promoting them as the PMC confirms or corrects them.

A few points I can confirm or suggest clarifying from the current project behavior/configuration:

  1. Deployment posture: IoTDB should generally be treated as operator-deployed infrastructure. My suggested wording is trusted-network-by-default, with the client RPC surface as the main in-model boundary. Direct public exposure, especially with default credentials, should not be considered a supported production posture.

  2. Default root:root: the default administrator account/password exists for initial setup and local getting-started use. It should be documented as a must-change before production use or exposure outside a trusted environment, not as a supported production posture.

  3. REST/MQTT defaults: both REST and MQTT are disabled by default in the current config:

    • enable_rest_service=false
    • enable_mqtt_service=false
  4. Client Thrift SSL is available but disabled by default:

    • enable_thrift_ssl=false
  5. Extension/server-side execution: USE_UDF, USE_TRIGGER, USE_PIPE, and USE_MODEL are system privileges and are grantable privileges, not strictly root/admin-only operations. My suggested security-model interpretation is that principals granted these privileges are trusted for the corresponding server-side execution capability. RBAC is the authorization boundary here, not a sandbox.

  6. Resource/DoS line: I would distinguish malformed/pre-auth/client input causing crashes, OOMs, deadlocks, or clearly unbounded behavior from ordinary expensive queries or write load. The former should remain in-model security-relevant behavior; the latter is generally an operator capacity/resource-management concern unless there is a specific bug such as super-linear amplification, missing limits where limits are expected, or a hang.
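Points 3 and 4 can be checked mechanically against a deployment's configuration. Below is a minimal sketch. It assumes a simple `key=value` properties file with no `:` separators or line continuations, and that an absent key falls back to the shipped default of `false`. Only the three key names come from the comment above; the helper names are hypothetical.

```python
# Shipped defaults as stated in the review; every other detail is an assumption.
SHIPPED_DEFAULTS = {
    "enable_rest_service": "false",
    "enable_mqtt_service": "false",
    "enable_thrift_ssl": "false",
}


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and #/! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def surface_report(text: str) -> dict:
    """Report which optional surfaces / transport settings are switched on."""
    props = parse_properties(text)
    return {key: props.get(key, default).lower() == "true"
            for key, default in SHIPPED_DEFAULTS.items()}


# An operator who opted in to REST but left MQTT commented out and Thrift SSL at its default:
print(surface_report("enable_rest_service=true\n# enable_mqtt_service=true\n"))
```

`True` for REST or MQTT means an opt-in surface is exposed. `False` for `enable_thrift_ssl` means client traffic is plaintext, which under the trusted-network posture is the operator's call.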

For inter-node trust, Byzantine peer assumptions, and the exact wording of the long-term triage policy, I think it is reasonable to keep them as explicit open questions and settle them through follow-up PMC discussion rather than trying to finalize the whole threat model in this PR.

So overall: I support using this PR as the initial draft, with the current defaults and privilege model above folded into the document where appropriate.

Generated-by: Claude Code

potiuk (Member, Author) commented Jun 4, 2026

Thanks @HTHou — pushed a revision folding in your review:

  • Trusted-network-by-default posture, with the authenticated client RPC surface as the main in-model boundary; direct public exposure (esp. with default creds) noted as not a supported posture.
  • Default root:root documented as must-change-before-production (OUT-OF-MODEL: non-default-build), not a supported posture.
  • Defaults reflected: REST off, MQTT off, client Thrift SSL off.
  • USE_UDF / USE_TRIGGER / USE_PIPE / USE_MODEL framed as grantable system privileges — principals holding them are trusted for that server-side execution; RBAC is the boundary, not a sandbox (UDF-RCE = BY-DESIGN).
  • DoS line split: malformed/pre-auth input causing crash/OOM/hang is in-model; ordinary expensive queries / write load are operator capacity (out-of-model unless super-linear amplification / missing-expected-limit / hang).

Per your note, I kept inter-node trust, the Byzantine-peer assumption, and the long-term triage policy as explicit §14 follow-up items rather than finalizing them here. Ready as the initial draft whenever you're set.

JackieTien97 (Contributor) left a comment

Thanks @potiuk and the ASF Security team for drafting this. The structure and the provenance tagging are genuinely useful. Below are the PMC's confirmations so the corresponding (inferred) tags can be promoted to (maintainer).

Wave 1 — scope & posture

  1. Deployment posture: Confirmed. IoTDB is intended to be deployed inside a trusted network, behind the application tier; the authenticated client RPC surface is the in-model boundary. We endorse trusted-network-by-default. (§2/§4/§7)
  2. Default root:root: It is a documented must-change, the operator's responsibility — not a supported production posture. A report against an unchanged default password is OUT-OF-MODEL: non-default-build. (§5a/§10/§11a)
  3. TsFile / SDK boundary: Confirmed — TsFile findings route to apache/tsfile; the iotdb-client-* SDKs are out of this batch. (§2/§3)

Wave 2 — boundaries & protocols

  4. Default-enabled protocols: Only the Thrift session protocol ships enabled. REST (enable_rest_service=false) and MQTT (enable_mqtt_service=false) are opt-in. (§2/§6)
  5. Inter-node channel: Assumed to run on a trusted network. Note that the ConfigNode↔DataNode and consensus channels currently have no transport encryption. A finding that requires intercepting or modifying inter-node traffic is OUT-OF-MODEL: adversary-not-in-scope under this posture; operators are responsible for network segmentation (§10). (§4/§7/§9)
  6. TLS: Available on the client surface but off by default (enable_thrift_ssl=false; REST also supports SSL). There is no inter-node TLS today (see #5). (§5a/§9)

Wave 3 — extension execution & adversary

  7. Extension code execution: Registration is not admin-only: USE_UDF / USE_TRIGGER / USE_PIPE / USE_MODEL are SYSTEM-level privileges that can be GRANTed to non-admin users. We agree that server-side code execution by a principal holding one of these privileges is by-design (BY-DESIGN): granting USE_* is equivalent to granting server-side code execution, and operators must treat it as such (§10). A scan reporting "UDF/Trigger/Pipe/Model allows arbitrary code execution" is BY-DESIGN, not a vulnerability.
  8. Cluster Byzantine posture: Cluster membership is assumed fully trusted. IoTDB does not claim Byzantine fault tolerance — no safety/liveness guarantee against an authenticated-but-malicious peer. (§7/§8)
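The "RBAC is the boundary, not a sandbox" rule reduces to one triage question: did the reporter's principal already hold the gating privilege? Below is a minimal sketch. It assumes a one-to-one mapping from extension kind to privilege. It also assumes that reaching extension execution without the grant would be an authorization bypass and therefore in scope. The enum and function are hypothetical, not IoTDB's authorizer API.

```python
from enum import Enum


class Privilege(Enum):
    USE_UDF = "USE_UDF"
    USE_TRIGGER = "USE_TRIGGER"
    USE_PIPE = "USE_PIPE"
    USE_MODEL = "USE_MODEL"


# Assumed mapping from extension kind to the system privilege that gates it.
GATING_PRIVILEGE = {
    "udf": Privilege.USE_UDF,
    "trigger": Privilege.USE_TRIGGER,
    "pipe": Privilege.USE_PIPE,
    "model": Privilege.USE_MODEL,
}


def triage_code_execution(kind: str, reporter_grants: set) -> str:
    """Classify an 'extension X runs arbitrary code' report."""
    required = GATING_PRIVILEGE[kind]
    if required in reporter_grants:
        # Holding the grant is equivalent to being trusted for server-side execution.
        return "BY-DESIGN"
    # Assumption: executing extension code without the grant bypasses RBAC, the
    # actual boundary, so it would be a genuine in-scope finding.
    return "VALID"


print(triage_code_execution("udf", {Privilege.USE_UDF}))
print(triage_code_execution("pipe", {Privilege.USE_UDF}))
```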

Wave 4 — properties & resource line

  9. Resource/DoS line: A single expensive query, or a write flood that exhausts CPU/memory/disk, is expected and the operator's capacity-planning problem, not a bug (expected degradation / OUT-OF-MODEL). This does not waive the §8 property that malformed pre-auth or client input must yield a clean error rather than crash, OOM, or hang the server — that remains in scope and VALID. (§8/§11a)
  10. No additional recurring false positives beyond the §11a seed list at this time.
  11. Canonical location: Keep the model in-repo as proposed (AGENTS.md → SECURITY.md → THREAT_MODEL.md); the IoTDB PMC owns revisions. (wave-4 Q11)
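Point 9 and its carve-out can be read as a small decision procedure. Below is a minimal sketch, assuming the report has already been reduced to a few booleans and an outcome label. Treating super-linear amplification, a missing expected limit, or a hang as in scope follows @HTHou's earlier comment. The field names and the `VALID` label for those cases are assumptions.

```python
from dataclasses import dataclass

# Outcomes that mean the server itself failed rather than merely slowed down.
SERVER_FAILURES = {"crash", "oom", "hang"}


@dataclass(frozen=True)
class ResourceReport:
    malformed_or_pre_auth: bool  # malformed client input, or any input sent before auth
    outcome: str                 # e.g. "crash", "oom", "hang", "slow", "exhaustion"
    superlinear_amplification: bool = False
    missing_expected_limit: bool = False


def triage_resource(report: ResourceReport) -> str:
    """Apply the resource/DoS line to a single report."""
    if report.malformed_or_pre_auth and report.outcome in SERVER_FAILURES:
        return "VALID"  # bad input must yield a clean error, not crash/OOM/hang
    if (report.superlinear_amplification or report.missing_expected_limit
            or report.outcome == "hang"):
        return "VALID"  # assumption: these specific bugs are treated as in-model
    return "OUT-OF-MODEL"  # expected degradation: operator capacity planning


print(triage_resource(ResourceReport(True, "crash")))
print(triage_resource(ResourceReport(False, "exhaustion")))
```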

One correction to §5 (clock): time-series semantics depend on timestamps, but server-side time ordering does not assume monotonic or synchronized clocks across the cluster. Please update the §5 clock assumption accordingly.

One note on the diff: the ## Security section was added to CLAUDE.md, which is correct — in this repo AGENTS.md is a symlink to CLAUDE.md, so the AGENTS.md → SECURITY.md → THREAT_MODEL.md chain resolves as intended. A one-line note in the PR description about the symlink would save future reviewers a double-take.

With these confirmed, ~9 of the §14 items can move from (inferred) to (maintainer). Happy to iterate on the wording directly on the branch.

potiuk (Member, Author) commented Jun 4, 2026

Thanks @JackieTien97 — exactly the confirmation pass the v0 needed. All eleven points plus the two corrections are folded into the branch; responding individually:

Wave 1

  1. Deployment posture — confirmed; §2/§4/§7 already carried this from @HTHou's pass, now reinforced. §14.1 resolved.
  2. Default root:root — confirmed OUT-OF-MODEL: non-default-build; §5a/§10/§11a/§13 unchanged, §14.2 resolved.
  3. TsFile / SDK boundary — promoted §3 + §14.3 from (inferred) to (maintainer): TsFile findings route to apache/tsfile; iotdb-client-* SDKs out of this batch.

Wave 2

  4. Default-enabled protocols — confirmed (Thrift on; REST/MQTT off); already maintainer-tagged, §14.4 resolved.
  5. Inter-node channel — the big open one. Rewrote §4 (secondary boundary + reachability rule), §6, §9, §14.5: inter-node assumed trusted-network, no transport encryption today, interception/modification findings → OUT-OF-MODEL: adversary-not-in-scope, operators own segmentation (§10). Now (maintainer).
  6. TLS — §5a + §9 + §14.6: client Thrift SSL off by default, no inter-node TLS today. Promoted.

Wave 3

  7. Extension code execution — confirmed BY-DESIGN for a principal holding USE_UDF/USE_TRIGGER/USE_PIPE/USE_MODEL; §9/§11a/§13 unchanged (maintainer from @HTHou), §14.7 resolved.
  8. Cluster Byzantine posture — rewrote §7 + §14.8: membership fully trusted, no BFT claim, no safety/liveness guarantee against an authenticated-but-malicious peer. Promoted.

Wave 4

  9. Resource/DoS line — confirmed, incl. the carve-out: malformed/pre-auth/client input causing crash/OOM/hang stays VALID (§8); ordinary expensive queries/write-load are operator capacity. §14.9 resolved.
  10. No additional false positives — noted; §14.10 resolved (no §11a additions).
  11. Canonical location — confirmed in-repo (AGENTS.md → SECURITY.md → THREAT_MODEL.md), PMC owns revisions; §14.11 resolved, §1 status updated.

Corrections

  • §5 clock — fixed: server-side time ordering does not assume monotonic/synchronized clocks across the cluster. Now (maintainer).
  • AGENTS.md symlink — added a note to the PR description: AGENTS.md is a symlink to CLAUDE.md here, so the ## Security section landing in CLAUDE.md is intentional and the chain resolves.

With your pass and @HTHou's folded in, §2–§13 are PMC-confirmed and §14 is fully resolved — residual (inferred) tags are limited to low-stakes environmental details. Re-requesting your review; happy to iterate on any wording.

Review thread on THREAT_MODEL.md:

- "UDF/Trigger/Pipe/Model can run arbitrary code" — by-design server-side execution gated by the grantable `USE_UDF`/`USE_TRIGGER`/`USE_PIPE`/`USE_MODEL` system privileges (§9); the principal holding the grant is trusted for that capability. Not a finding. *(maintainer — HTHou)*
- "Default password is `root`" — operator must-change-before-production per §5a/§10; `OUT-OF-MODEL: non-default-build`, not a code bug in itself. *(maintainer — HTHou)*
- "Server reachable on the public internet / no TLS by default on the wire" — operator deployment responsibility (§9/§10); client Thrift SSL is off by default and public exposure is a non-supported posture (§3). *(maintainer — HTHou)*

chrisdutz (Contributor) commented

Especially with respect to the "secure by default" requirements of the EU Cyber Resilience Act (CRA), this might cause issues in the future; I'm not sure we should exclude it. I would much rather enable TLS by default and have users actively disable it.

Contributor replied

Thanks @chrisdutz, that is a fair point, especially from a future CRA / secure-by-default perspective.

One practical question: if IoTDB enables TLS by default, that also means every deployment needs usable certificates/keys or keystore/truststore material at startup. My understanding is that those certificates and private keys should be provided or generated by the operator/deployment environment, not shipped by the IoTDB project itself.

So I wonder what “TLS enabled by default” should mean concretely here:

  • should IoTDB fail to start until the operator provides proper TLS material?
  • should IoTDB generate a self-signed certificate automatically for first startup?
  • or should this be tracked as a future hardening / CRA-readiness item, while the threat model documents the current state: TLS is available but off by default, and operators are responsible for enabling it with their own certificates/keys?

I agree we should not lose the secure-by-default concern. I just want to avoid implying that the project can safely provide universal default certificates/keys, or that enabling TLS without operator-provided trust material gives the intended security property.
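The three options listed above differ mainly in what happens when no operator-provided key material exists at startup. Below is a minimal sketch of that decision, purely to compare the options. The enum and the function are assumptions, and so is the refuse-to-start behaviour for an explicit opt-in without key material. None of it describes current IoTDB behaviour.

```python
from enum import Enum


class TlsDefault(Enum):
    OFF = "off"                  # today: TLS available, operator opts in
    FAIL_CLOSED = "fail-closed"  # TLS on by default, refuse to start without key material
    SELF_SIGNED = "self-signed"  # TLS on by default, generate a certificate on first start


def transport_at_startup(policy: TlsDefault, operator_opted_in: bool,
                         has_operator_key_material: bool) -> str:
    """Return the client transport a node would start with under a given default."""
    tls_wanted = operator_opted_in or policy is not TlsDefault.OFF
    if not tls_wanted:
        return "plaintext"
    if has_operator_key_material:
        return "tls (operator certificate)"
    if policy is TlsDefault.SELF_SIGNED:
        # Encrypts the wire, but clients have no trust anchor to verify the
        # server against unless they pin the generated certificate.
        return "tls (self-signed, unverified)"
    # FAIL_CLOSED, or an explicit opt-in without key material (assumed to be a
    # configuration error): refuse to start rather than silently fall back.
    raise RuntimeError("TLS is required but no operator-provided key material was found")


print(transport_at_startup(TlsDefault.OFF, False, False))
print(transport_at_startup(TlsDefault.SELF_SIGNED, False, False))
```

The self-signed branch is the case the comment warns about. It turns on encryption without giving clients anything to authenticate the server against, so on its own it does not deliver the intended security property.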
